Skip to content

Refactor: Force all configurations to be fetched from settings.py#179

Open
KelvinLinBU wants to merge 1 commit into
CCI-MOC:mainfrom
KelvinLinBU:refactor-env
Open

Refactor: Force all configurations to be fetched from settings.py#179
KelvinLinBU wants to merge 1 commit into
CCI-MOC:mainfrom
KelvinLinBU:refactor-env

Conversation

@KelvinLinBU

@KelvinLinBU KelvinLinBU commented Apr 22, 2025

Copy link
Copy Markdown
Member

Closes #168. Adhere to testing best practices as outlined in #159 (comment) and #168 (comment). Modify utils.py to accommodate changes.

Edit (04/07/2026): We decided to move on configuration loading to the new settings.py module.

@QuanMPhm QuanMPhm requested review from QuanMPhm and larsks and removed request for QuanMPhm April 22, 2025 14:56
@KelvinLinBU KelvinLinBU self-assigned this May 14, 2025
@KelvinLinBU KelvinLinBU force-pushed the refactor-env branch 3 times, most recently from 63d9226 to 6d1bdef Compare May 29, 2025 16:02
@KelvinLinBU

Copy link
Copy Markdown
Member Author

@QuanMPhm can be merged with main now

- NewPICreditProcessor
"""

chrome_binary_location: str

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this arg below the constants. In this codebase, the conventional structure is class constants, then variables, then functions

chrome_binary_location = os.environ.get(
"CHROME_BIN_PATH", "/usr/bin/chromium"
)
if not os.path.exists(chrome_binary_location):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not need to move this check to process_report.py. We only want to move the fetching of the env var. Error checking can stay in the invoice class for now.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add CHROME_BIN_PATH in REQUIRED_ENV_VARS

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason to perform validation in two different places. If we're using pydantic_settings to manage environment variables, then rather than using required_env_files here, just mark chrome_bin_path as required in process_report.settings.Settings. Currently it is optional:

    chrome_bin_path: str | None = None

To make it required:

    chrome_bin_path: str

This will cause pydantic to throw a validation error when instantiating Settings:

pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
chrome_bin_path
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing

Similarly, you can remove the check for KEYCLOAK_CLIENT_ID and KEYCLOAK_CLIENT_SECRET here by adding a validator to Settings:

from pydantic import model_validator
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    ...

    
    @model_validator(mode="after")
    def check_keycloak_auth(self):
        if not self.coldfront_api_filepath and not (
            self.keycloak_client_id and self.keycloak_client_secret
        ):
            raise ValueError(
                "You must either set coldfront_api_filepath or provide keycloak credentials in "
                "KEYCLOAK_CLIENT_ID and KEYCLOAK_CLIENT_SECRET"
            )

You probably want to catch the exception from pydantic to produce a more friendly error message that does not include a full Python traceback.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the very detailed feedback. I don't know why I didn't thought of getting rid of that hacky function in the first place. Your feedback makes it look obvious

Comment thread process_report/process_report.py Outdated
def main():
"""Remove non-billable PIs and projects"""

chrome_binary_location = os.environ.get("CHROME_BIN_PATH", "/usr/bin/chromium")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the adding CHROME_BIN_PATH to REQUIRED_ENV_VARS, you should have the env var fetched after the env vars have been validated

@QuanMPhm QuanMPhm force-pushed the refactor-env branch 2 times, most recently from 4f89e7d to 0c340e2 Compare July 22, 2025 19:50
@QuanMPhm QuanMPhm requested review from knikolla and naved001 July 24, 2025 15:01
@QuanMPhm QuanMPhm dismissed their stale review August 1, 2025 14:13

I am now a contributor to this PR as well

subprocess.run(
[
CHROME_BIN_PATH,
os.environ.get("CHROME_BIN_PATH", "/usr/bin/chromium"),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the motivation for this change? I think it's generally better practice to read your environment variables early, rather than doing it inline like this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've written so that the env var is loaded by the settings module

@QuanMPhm

QuanMPhm commented Mar 15, 2026

Copy link
Copy Markdown
Contributor

@larsks @knikolla After half a year, I want to wrap up this PR

Comment thread .github/workflows/unit-tests.yaml Outdated
name: Run unit tests
runs-on: ubuntu-latest
env:
CHROME_BIN_PATH: /usr/foo/chromium

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we setting CHROME_BIN_PATH to an invalid path here?

@QuanMPhm QuanMPhm Apr 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit tests were designed to not require integration with 3rd-party tools. We do have an actual integration test for Chrome that installs Chromium and sets a real path.

As for why the env var is set, rather than not set at all, I wanted to check that code actual read the env var CHROME_BIN_PATH, whatever it maybe.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this. Is there any way you can provide these values through mocking during the tests?

@larsks larsks Apr 2, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To echo what Kristi said: unit tests should never depend on the value of external environment variables. Regardless of whether CHROME_BIN_PATH is set to an invalid value, a valid path, or is unset, the unit tests should behave the same.

You don't even need to use mocking; you can just set environment variables directly in your unit tests

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsks @knikolla Because the pydantic settings fetched the environment variables at import time, I don't believe mocking the env vars within the tests won't solve it. Something like this:

@mock.patch.dict(os.environ, {"CHROME_BIN_PATH": "/usr/foo/chromium"})
class TestInvoice(TestCase):

Or...

def setUp():
  monkeypatch.setenv("CHROME_BIN_PATH", "bar")

Still raises a validation error from Pydantic.

The solutions I found were to use a tool like pytest-env that sets environment variables before import time, to import the settings module (or the modules that use it) in setUp() rather at the top-level of the file, or refactor settings.py so it doesn't fetch env vars at import time, which has bigger upfront work. What are you thoughts?

@larsks larsks Apr 6, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to work around this by setting your environment variables before anything tries to import your settings module:

diff --git a/process_report/tests/integration/test_chromium.py b/process_report/tests/integration/test_chromium.py
index 2e957a8..4bb1995 100644
--- a/process_report/tests/integration/test_chromium.py
+++ b/process_report/tests/integration/test_chromium.py
@@ -3,6 +3,9 @@ from typing import override
 
 import pandas as pd
 
+os.environ["CHROME_BIN_PATH"] = "/foo"
+os.environ["COLDFRONT_API_FILEPATH"] = "/bar"
+
 from process_report.invoices.pi_specific_invoice import PIInvoice
 from process_report.invoices import invoice
 from process_report.tests.base import BaseTestCaseWithTempDir

We have talked before about why it's a good to avoid code that runs at import time, and this is a perfect example of how that complicates testing.

This is a bit janky and would require changes across all of the test files.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using pytest-env would be the best short-term fix.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason to perform validation in two different places. If we're using pydantic_settings to manage environment variables, then rather than using required_env_files here, just mark chrome_bin_path as required in process_report.settings.Settings. Currently it is optional:

    chrome_bin_path: str | None = None

To make it required:

    chrome_bin_path: str

This will cause pydantic to throw a validation error when instantiating Settings:

pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
chrome_bin_path
  Field required [type=missing, input_value={}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing

Similarly, you can remove the check for KEYCLOAK_CLIENT_ID and KEYCLOAK_CLIENT_SECRET here by adding a validator to Settings:

from pydantic import model_validator
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    ...

    
    @model_validator(mode="after")
    def check_keycloak_auth(self):
        if not self.coldfront_api_filepath and not (
            self.keycloak_client_id and self.keycloak_client_secret
        ):
            raise ValueError(
                "You must either set coldfront_api_filepath or provide keycloak credentials in "
                "KEYCLOAK_CLIENT_ID and KEYCLOAK_CLIENT_SECRET"
            )

You probably want to catch the exception from pydantic to produce a more friendly error message that does not include a full Python traceback.

Comment thread process_report/settings.py Outdated
fetch_from_s3: bool = True
upload_to_s3: bool = False

chrome_bin_path: str

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to keep providing a default as before as you've now made the environment variable required for running all unit tests for no obvious gain.

Comment thread .github/workflows/unit-tests.yaml Outdated
name: Run unit tests
runs-on: ubuntu-latest
env:
CHROME_BIN_PATH: /usr/foo/chromium

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this. Is there any way you can provide these values through mocking during the tests?

@QuanMPhm QuanMPhm changed the title Refactor PIINVOICE so env vars are fetched at start of billing pipeline. Refactor: Force all configurations to be fetched from settings.py Apr 7, 2026
@QuanMPhm QuanMPhm requested a review from knikolla April 7, 2026 19:55
Also added validation for missing environment variables in settings.py
and nice formatting for validation errors

Using pytest-env to set environment variables before pydantic-settings
fetches them at import time. This may be refactored in the future
to avoid import-time fetching
@QuanMPhm

QuanMPhm commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

@knikolla @larsks I've decided to use pytest-env to mock required env vars, currently only COLDFRONT_API_FILEPATH

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor invoices so all env vars are fetched at the start of billing pipeline

4 participants